Abstract

We present nonlinear photonic circuit models for constructing programmable linear 
transformations and use these to realize a coherent Perceptron, i.e., an all-optical linear 
classifier capable of learning the classification boundary iteratively from training data 
through a coherent feedback rule. Through extensive semi-classical stochastic simulations
we demonstrate that the device nearly attains the theoretical error bound for a
model classification problem. 
1 Introduction

Recent progress in integrated nanophotonic engineering [23, 35] has motivated follow-up
proposals [25, 36] of nanophotonic circuits for all-optical information processing. While
most of these focus on implementations of digital logic, we present here an approach to
all-optical analog, neuromorphic computation and propose design schemes for a set of
devices to be used as building blocks for large-scale circuits.

Optical computation has been a long-time goal, with research interest surging regularly
after new engineering capabilities are attained [32, 33], but so far the parallel progress
and momentum of CMOS-based integrated electronics have outperformed all-optical
devices.

In recent years we have seen rapid progress in machine learning, and in artificial
intelligence in general. Although most current ‘big data’ applications are realized on
digital computing architectures, an increasing amount of computation is now done in
specialized hardware such as GPUs. Specialized analog computational devices that solve
specific subproblems more efficiently than is possible with either GPUs or general-purpose
computers are being considered or already implemented by companies such as IBM, Google
and HP, as well as in academia. Specifically in the field of neuromorphic computation, there
has been impressive progress on CMOS-based analog computation platforms.

Several neuromorphic approaches that use complex nonlinear optical systems for machine
learning applications have recently been proposed, and some initial schemes have been
implemented. So far, however, all of these ‘optical reservoir computers’ have still required
digital computers to prepare the inputs and process the outputs of these devices, with the
optical systems only being employed as static nonlinear mappings for dimensional lifting to
a high-dimensional feature space, in which one then applies straightforward linear
regression or classification for learning an input-output map [53].

In this work, we address how the final stage of such a system, i.e., the linear classifier,
could be realized all-optically. We provide a universal scheme, i.e., one independent of the
particular kind of optical nonlinearity employed, for constructing tunable all-optical,
phase-sensitive amplifiers, and then outline how these can be combined with self-oscillating
systems to realize an optical amplifier with programmable gain, i.e., one whose gain can be
set once and then remains fixed.

Using these as building blocks, we construct an all-optical perceptron [39, 30], a system
that can classify multi-dimensional input data and, using pre-classified training data, learn
the correct classification boundary ‘on-line’, i.e., incrementally. The perceptron can be seen
as a highly simplified model of a neuron. While the idea of all-optical neural networks has
been proposed before and an impressive scheme using electronic, measurement-based
feedback for spiking optical signals has been realized, to our knowledge we offer the
first complete description of how the synaptic weights can be stored in an optical memory
and programmed via feedback.

The physical models underlying the employed circuit components are high intrinsic-Q
optical resonators with strong optical nonlinearities. For theoretical simplicity we assume
resonators with either a $\chi^{(2)}$ or a $\chi^{(3)}$ nonlinearity, but the design can be adapted
to rely on only one of these two, or on alternative nonlinearities such as those based on
free-carrier effects or optomechanical interactions.

The strength of the optical nonlinearity and the achievable Q-factors of the optical 
resonators determine the overall power scale and rate at which a real physical device could 
operate. Both a stronger nonlinearity and a higher Q allow operation at lower overall power.

We present numerical simulations of the system dynamics based on the semi-classical
Wigner approximation to the full coherent quantum dynamics presented in [?]. For photon
numbers as low as about 10 to 20, this approximation allows us to accurately model the
effect of optical quantum shot noise even in large-scale circuits.

In the limit of both very high Q and very strong nonlinearity, we expect quantum effects
to become significant, as entanglement can arise between the field modes of physically
separated resonators. In the appendix, we provide full quantum models for all basic
components of our circuit. The possibility of a quantum speedup is being addressed in
ongoing work. Recently, D-Wave Systems has generated a lot of interest in its
superconducting-qubit-based quantum annealer. Although the exact benefits of quantum
dynamics in these machines have not been conclusively established, recent results
analyzing the role of tunneling in a quantum annealer [3] are intriguing and suggest that
quantum effects can be harnessed in computational devices that are not unitary quantum
computers.

1.1 The Perceptron algorithm 

The perceptron is a machine learning algorithm that maps an input $x \in \mathbb{R}^n$ to a single
binary class label $y_w[x] \in \{0,1\}$. Binary classifiers generally operate by dividing the input
space into two disjoint sets and identifying these with the class labels. The perceptron is a
linear classifier, meaning that the surface separating the two class label sets is a linear
subspace, a hyperplane, and its output is computed simply by applying a step function
$\theta(u) := \mathbb{1}_{u > 0}$ to the inner product of a single data point $x$ with a fixed weight
vector $w$:


$$y_w[x] := \theta(w^T x) = \begin{cases} 1 & \text{for } w^T x > 0, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$


Geometrically, the weight vector $w$ parametrizes the hyperplane $\{z \in \mathbb{R}^n : w^T z = 0\}$
that forms the decision boundary.

In the above parametrization the decision boundary always contains the origin $z = 0$,
but the more general case of an affine decision boundary $\{z \in \mathbb{R}^n : w^T z = b\}$ can be
obtained by extending the input vector by a constant, $\tilde z = (z^T, 1)^T \in \mathbb{R}^{n+1}$, and
similarly defining an extended weight vector $\tilde w = (w^T, -b)^T$, so that $\tilde w^T \tilde z = w^T z - b$.

The perceptron converges in a finite number of steps for all linearly separable problems
by randomly iterating over a set of pre-classified training data
$\{(y^{(j)}, x^{(j)}) \in \{0,1\} \times \mathbb{R}^n,\ j = 1, 2, \dots, M\}$ and imparting a small weight
correction $w \to w + \Delta w$ for each falsely classified training example $x^{(j)}$:

$$\Delta w = \alpha \left( y^{(j)} - y_w[x^{(j)}] \right) x^{(j)}. \tag{2}$$

The learning rate $\alpha > 0$ determines the magnitude of the correction applied for each
training example. The expression in parentheses can only take on the values $\{0, -1, 1\}$, with
zero corresponding to a correctly classified example and the nonzero values corresponding to
the two different possible classification errors.
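A minimal software sketch of the discrete rule (1) and (2), including the extended-input trick for affine boundaries, may help fix ideas. It is written in Python with NumPy; the function names and the toy data set are illustrative assumptions and are not part of the optical circuit described in this paper.

```python
import numpy as np


def predict(w, x):
    """Perceptron output theta(w^T x) from Eq. (1): 1 if w^T x > 0, else 0."""
    return 1 if w @ x > 0 else 0


def augment(X):
    """Append a constant 1 to each input row, so that an affine boundary
    w^T z = b becomes homogeneous with extended weights (w, -b)."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_perceptron(X, y, alpha=0.1, max_epochs=1000, rng=None):
    """Iterate over the training set in random order, applying Eq. (2)
    to every misclassified example, until a full epoch has no errors."""
    rng = np.random.default_rng(0) if rng is None else rng
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for j in rng.permutation(len(y)):
            delta = y[j] - predict(w, X[j])   # one of 0, -1, +1
            if delta != 0:
                w += alpha * delta * X[j]     # Delta w = alpha (y - y_w[x]) x
                errors += 1
        if errors == 0:
            break
    return w


# Toy linearly separable data: label 1 iff x_1 + x_2 > 1, with a margin around the boundary.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(400, 2))
score = X[:, 0] + X[:, 1] - 1
keep = np.abs(score) > 0.3
X, y = augment(X[keep]), (score[keep] > 0).astype(int)
w = train_perceptron(X, y)
```

Because the weights start at zero here, the learning rate only rescales $w$ and does not change the sequence of predictions; it becomes relevant once $w$ is initialized elsewhere.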

Usually there exist many separating hyperplanes for a given linear binary classification
problem. The standard perceptron is only guaranteed to find one that works for the training
set. It is possible to introduce a notion of optimality to this problem by considering the
minimal distance (“margin”) of the training data to the separating hyperplane found.
Maximization of this margin naturally leads to the “support vector machine” (SVM) algorithm
[9]. Although the SVM outperforms the perceptron in many classification tasks, it does
not lend itself as readily to a hardware implementation because it cannot be trained
incrementally. It is this incremental training that makes the perceptron algorithm especially
suited for a hardware implementation: we can convert the discrete update rule (2) into a
differential equation

$$\dot w(t) = \alpha \left[ y(t) - y_{w(t)}(t) \right] x(t), \tag{3}$$

and then construct a physical system that realizes these dynamics. Here
$y_{w(t)}(t) := y_{w(t)}[x(t)]$ denotes the current estimate. In this continuous-time version the
inputs are piecewise constant, $x(t) = x^{(j_t)}$ and $y(t) = y^{(j_t)}$, and take on the same
discrete values as above, indexed by $j_t \in \{1, 2, \dots, M\}$.
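The continuous-time rule (3) can be illustrated by a forward-Euler integration with piecewise-constant inputs. This is a numerical sketch only: the hold time, step size and function name are assumptions, and it ignores noise and all of the optical circuit dynamics.

```python
import numpy as np


def continuous_perceptron(X, y, w0, alpha=1.0, hold=1.0, dt=1e-2):
    """Forward-Euler integration of Eq. (3),
        dw/dt = alpha * (y(t) - theta(w(t)^T x(t))) * x(t),
    where the j-th example (x^(j), y^(j)) is presented for a time `hold`."""
    w = np.array(w0, dtype=float)
    steps = int(round(hold / dt))
    for xj, yj in zip(X, y):                # x(t), y(t) piecewise constant
        for _ in range(steps):
            y_est = 1 if w @ xj > 0 else 0  # current estimate y_{w(t)}(t)
            w += dt * alpha * (yj - y_est) * xj
    return w


# A single misclassified example: w moves along x only until the example is classified correctly.
w = continuous_perceptron(np.array([[1.0, 0.0]]), [1], w0=[-1.0, 0.0], hold=2.0)
```

Unlike the discrete rule, within one hold period the weights keep moving only while the presented example is misclassified (here $w_1$ stops just above zero), so the correction per example is bounded by $\alpha \cdot \text{hold} \cdot \|x\|$.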


1.2 The circuit modeling framework 

Circuits are fully described via the Quantum Hardware Description Language (QHDL) [49],
based on Gough and James’ SLH framework [20, 19]. To carry out numerical simulations
for large-scale networks, we derive a system of semi-classical Langevin equations based
on the Wigner transformation as described in [?]. Note that there is a perfect one-to-one
correspondence between nonlinear cavity models expressed via SLH and the Wigner method
as long as the nonlinearities involve only oscillator degrees of freedom. There is ongoing
research in our group to establish similar results for more general nonlinearities [22].
Both the Wigner method and the more general SLH framework can be used to model
networks of quantum systems where the interconnections are realized through bosonic
quantum fields. The SLH framework describes a system interacting with $n$ independent input
fields in terms of a unitary scattering matrix $S$ parametrizing direct field scattering, a
coupling vector $L = (L_1, L_2, \dots, L_n)^T$ parametrizing how external fields couple into the
system and how the system variables couple to the outputs, and a Hamilton operator $H$
inducing the internal dynamics. We summarize these objects in a triplet $(S, L, H)$. $L$ and $H$
are sufficient to parametrize any Schrödinger picture simulation of the quantum dynamics;
e.g., the master equation for a mixed system state $\rho$ is given by

$$\dot\rho = -i[H, \rho] + \sum_{j=1}^{n} \left( L_j \rho L_j^\dagger - \frac{1}{2}\left\{ L_j^\dagger L_j, \rho \right\} \right). \tag{4}$$
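As a concrete check of Eq. (4), the following NumPy sketch evaluates its right-hand side for a single damped, detuned cavity mode in a truncated Fock basis, with $H = \Delta a^\dagger a$ and a single coupling operator $L = \sqrt{\kappa}\, a$. The truncation dimension and parameter values are arbitrary illustrative choices.

```python
import numpy as np


def destroy(dim):
    """Annihilation operator a truncated to the Fock states |0>, ..., |dim-1>."""
    return np.diag(np.sqrt(np.arange(1, dim)), k=1).astype(complex)


def master_equation_rhs(rho, H, Ls):
    """d rho/dt = -i[H, rho] + sum_j (L_j rho L_j^+ - {L_j^+ L_j, rho}/2), Eq. (4)."""
    drho = -1j * (H @ rho - rho @ H)
    for L in Ls:
        LdL = L.conj().T @ L
        drho += L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL)
    return drho


dim, kappa, Delta = 10, 1.0, 0.3
a = destroy(dim)
n_op = a.conj().T @ a
H = Delta * n_op
rho = np.zeros((dim, dim), dtype=complex)
rho[1, 1] = 1.0                          # single-photon Fock state |1><1|
drho = master_equation_rhs(rho, H, [np.sqrt(kappa) * a])
```

For this state the result is $\kappa(|0\rangle\langle 0| - |1\rangle\langle 1|)$: the trace is preserved and the photon number decays at rate $\kappa$.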

The scattering matrix S is important when composing components into a network. In 
particular, the input-output relation in the SLH framework is given by 

$$dA_{\rm out} = S\, dA_{\rm in} + L\, dt, \tag{5}$$

where the $dA_{{\rm in/out},j}$, $j = 1, 2, \dots, n$, are to be understood as quantum stochastic
processes whose differentials can be manipulated via a quantum Itô calculus [20]. The Wigner
method provides a simplified, approximate description, which is valid when all nonlinear
resonator modes are in strongly displaced states [?]. The simulations presented here were carried out
exclusively at energy scales for which the Wigner method is valid, allowing us to scale to 
much larger system sizes than we could in a full SLH-based quantum simulation. This is 
because the computational complexity of the Wigner method scales at most quadratically 
(and in sparsely interconnected systems nearly linearly) with the number of components as 
opposed to the exponential state space scaling of a quantum mechanical Hilbert space. We 
nonetheless provide our models in both Wigner-method form and SLH form in anticipation 
that our component models will also be extremely useful in the full quantum regime. 

In the Wigner-based formalism, a system is described in terms of time-dependent complex
coherent amplitudes $\alpha(t) = (\alpha_1(t), \alpha_2(t), \dots, \alpha_m(t))^T$ for the internal cavity modes
and $\beta_{\rm in}(t) = (\beta_{{\rm in},1}(t), \dots, \beta_{{\rm in},n}(t))^T$ for the external inputs. These amplitudes
relate to quantum mechanical expectations as $\langle \alpha_j \rangle \approx \langle a_j \rangle_{\rm QM}$, where $\langle \cdot \rangle$
denotes the expectation with respect to the Wigner quasi-distribution and $\langle \cdot \rangle_{\rm QM}$ a
quantum mechanical expectation value. See [?] for the corresponding relations of
higher-order moments.

To simplify the analysis, we exclusively work in a rotating frame with respect to all
driving fields. As in the SLH case, we define output modes $\beta_{\rm out}(t)$ that are algebraically
related to the inputs and the internal modes. The full dynamics of the internal and external
modes are then governed by a multi-dimensional Langevin equation

$$\dot\alpha(t) = \left[ A\alpha(t) + a + A_{\rm NL}(\alpha, t) \right] + B\beta_{\rm in}(t), \tag{6}$$

as well as a purely algebraic, linear input-output relationship 

$$\beta_{\rm out}(t) = \left[ C\alpha(t) + c \right] + D\beta_{\rm in}(t). \tag{7}$$

The complex matrices $A, B, C, D$ as well as the constant bias input vectors $a$ and $c$
parametrize the linear dynamics, whereas the function $A_{\rm NL}(\alpha, t)$ gives the nonlinear
contribution to the dynamics of the internal cavity modes.
Each input consists of a coherent, deterministic part and a stochastic contribution,
$\beta_{{\rm in},j}(t) = \bar\beta_{{\rm in},j}(t) + \eta_j(t)$. The stochastic terms $\eta_j(t) = \eta_{j,1}(t) + i\eta_{j,2}(t)$
are assumed to be independent complex Gaussian white noise processes with correlation
function $\langle \eta_{j,s}(t)\, \eta_{k,r}(t') \rangle = \frac{1}{4}\delta_{jk}\delta_{sr}\delta(t - t')$.
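A minimal Euler-Maruyama integrator for Eqs. (6) and (7) with the quadrature noise statistics above could look as follows. As an example we use a single-mode Kerr cavity with the assumed conventions $A = -(\kappa/2 + i\Delta)$, $B = -\sqrt{\kappa}$, $C = \sqrt{\kappa}$, $D = 1$ and $A_{\rm NL} = -i\chi|\alpha|^2\alpha$; these sign and normalization choices are ours, need not match those of the appendix, and ignore any correction terms the Wigner transformation may add to the nonlinearity. QHDLJ itself is not used here.

```python
import numpy as np


def simulate_wigner(A, a, f_nl, B, C, c, D, beta_bar, alpha0, dt, n_steps,
                    noise=True, rng=None):
    """Euler-Maruyama sketch of Eqs. (6)-(7):
        d alpha/dt = A alpha + a + f_nl(alpha, t) + B beta_in(t),
        beta_out   = C alpha + c + D beta_in(t),
    with beta_in = beta_bar(t) + eta(t). Each real quadrature of eta is white
    noise with <eta_s(t) eta_s(t')> = delta(t - t')/4; averaged over one step
    of length dt it becomes a Gaussian with variance 1/(4 dt)."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = np.array(alpha0, dtype=complex)
    n_in = B.shape[1]
    outputs = np.empty((n_steps, C.shape[0]), dtype=complex)
    for k in range(n_steps):
        t = k * dt
        eta = np.zeros(n_in, dtype=complex)
        if noise:
            eta = np.sqrt(0.25 / dt) * (rng.standard_normal(n_in)
                                        + 1j * rng.standard_normal(n_in))
        beta_in = beta_bar(t) + eta
        outputs[k] = C @ alpha + c + D @ beta_in
        alpha = alpha + dt * (A @ alpha + a + f_nl(alpha, t) + B @ beta_in)
    return alpha, outputs


# Assumed single-mode Kerr cavity: decay kappa, detuning Delta, Kerr coefficient chi.
kappa, Delta, chi = 1.0, 0.0, 0.0
A = np.array([[-(kappa / 2 + 1j * Delta)]])
B = np.array([[-np.sqrt(kappa)]])
C = np.array([[np.sqrt(kappa)]])
D = np.eye(1)
zero = np.zeros(1)


def kerr(alpha, t):
    return -1j * chi * np.abs(alpha) ** 2 * alpha


def drive(t):
    return np.array([1.0 + 0j])


alpha_ss, out = simulate_wigner(A, zero, kerr, B, C, zero, D, drive, [0.0],
                                dt=1e-2, n_steps=2000, noise=False)
```

With $\chi = 0$ and the noise switched off, a resonant unit drive relaxes to $\alpha_{\rm ss} = -2/\sqrt{\kappa}$ and a reflected output of $-1$; setting $\chi \neq 0$ and `noise=True` produces individual Wigner trajectories.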

The linearity of the input-output relationship in either framework, (5) and (7), in the
external degrees of freedom leads to algebraic rules for deriving reduced models for whole
circuits of nonlinear optical resonators by concatenating component models and algebraically
solving for their interconnections. The basic component models used in this work are given
in Appendix A. Netlists for composite components and the whole circuit will be made
available at [?].
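For the simplest interconnection, a feed-forward cascade in which the outputs of one component drive the inputs of the next, the algebraic reduction can be written out explicitly for the linear parts of Eqs. (6) and (7); a nonlinear term $A_{\rm NL}$ would simply be carried along for each component. The sketch below uses hypothetical dictionary-based system records and a hypothetical `cavity` helper with assumed sign conventions. It is not the QHDL/SLH implementation, which also handles feedback connections.

```python
import numpy as np

KEYS = ("A", "a", "B", "C", "c", "D")


def cascade(sys1, sys2):
    """Feed every output of sys1 into the corresponding input of sys2.

    Each system is a dict with entries A, a, B, C, c, D as in Eqs. (6)-(7)
    (nonlinear term omitted); the result acts on the stacked state (alpha1, alpha2)."""
    A1, a1, B1, C1, c1, D1 = (sys1[k] for k in KEYS)
    A2, a2, B2, C2, c2, D2 = (sys2[k] for k in KEYS)
    n1, n2 = A1.shape[0], A2.shape[0]
    return {
        "A": np.block([[A1, np.zeros((n1, n2))], [B2 @ C1, A2]]),
        "a": np.concatenate([a1, a2 + B2 @ c1]),
        "B": np.vstack([B1, B2 @ D1]),
        "C": np.hstack([D2 @ C1, C2]),
        "c": D2 @ c1 + c2,
        "D": D2 @ D1,
    }


def dc_transfer(sys):
    """Zero-frequency input-output matrix D - C A^{-1} B (assumes A invertible)."""
    return sys["D"] - sys["C"] @ np.linalg.solve(sys["A"], sys["B"])


def cavity(kappa, delta):
    """Single-mode linear cavity with one input/output port (assumed conventions)."""
    return {"A": np.array([[-(kappa / 2 + 1j * delta)]]), "a": np.zeros(1, complex),
            "B": np.array([[-np.sqrt(kappa)]]), "C": np.array([[np.sqrt(kappa)]]),
            "c": np.zeros(1, complex), "D": np.eye(1)}


s1, s2 = cavity(1.0, 0.0), cavity(2.0, 0.7)
combined = cascade(s1, s2)
```

As expected for a feed-forward chain, the zero-frequency transfer matrix of the reduced model is the product of the individual ones; two identical resonant cavities, each reflecting with a sign flip, combine to unit transfer.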


2 The Coherent Perceptron Circuit 

Figure 1: An example perceptron circuit consisting of $N = 4$ programmable amplifiers for
the coherent input vector $x = (x_1, x_2, x_3, x_4)^T$, a static mixing element that sums their
outputs, a quadrature filter to remove the imaginary quadrature and a final thresholding
element to generate the estimated binary class label $\hat y$. The additional binary input $T$
controls whether the system is in training mode, in which case the estimated class label
$\hat y$ is compared to the true class label $Y$, which is provided as an additional input. When
they differ, the programmable amplifiers receive a feedback signal to adjust their internal
weights.

The full perceptron circuit is visualized in Figure 1. The input data $x$ to the perceptron
circuit is encoded in the real quadrature of $N$ coherent optical inputs. Equation (3) tells
us which circuit elements are required for a hardware implementation, by decomposing the
necessary operations:

1. Each input $x_j$ is multiplied by a weight $w_j$.

2. The weighted inputs are coherently added.

3. The sum drives a thresholding element to generate the estimated class label $\hat y$.

4. In the training phase (input $T = 1$) the estimated class label $\hat y$ is compared with the
true class label (input $Y$) and, based on the outcome, feedback is applied to modify
the weights $\{w_j\}$.


The most crucial element of this circuit is the system that multiplies an input $x_j$ with a
programmable weight $w_j$. This not only requires a linear amplifier with tunable gain,
but also a way to encode and store the continuous weights $w_j$. In the following we outline
one way in which such systems can be constructed from basic nonlinear optical cavity models:
Section 2.1 presents an elegant way to construct a phase-sensitive linear optical amplifier
whose gain can be tuned by changing the amplitude of a bias input. In Section 2.2 we
propose using an above-threshold non-degenerate optical parametric oscillator to store a
continuous variable in the output phase of the signal (or idler) mode. In Section 2.3 these
systems are combined to realize an optical amplifier with programmable gain, i.e., a control
input can program its gain, which then stays constant even after the control has been turned
off. Finally, in Section 2.4 we present a simple model for all-optical switches based on a
cavity with two modes that interact via a cross-Kerr effect. This element is used both for
the feedback logic and for the thresholding function that generates the class label $\hat y$.


2.1 Tunable Gain Kerr-amplifier 

A single-mode Kerr-nonlinear resonator driven by an appropriately detuned coherent drive
$\epsilon$ can have a strongly nonlinear dependence of the intra-cavity energy on the drive power.
When the drive of a single resonator is given by the sum of a constant large bias amplitude
and a small signal, $\epsilon = \frac{1}{\sqrt{2}}(\epsilon_0 + \delta\epsilon)$, the steady-state reflected amplitude is
$\epsilon' = \frac{1}{\sqrt{2}}\left(\eta\,\epsilon_0 + g_-(\epsilon_0)\,\delta\epsilon + g_+(\epsilon_0)\,\delta\epsilon^*\right) + O(\delta\epsilon^2)$, where $|\eta| \le 1$, with
equality for the ideal case of negligible intrinsic cavity losses. The small signal thus
experiences phase-sensitive gain that depends on the bias amplitude and phase. We provide
analytic expressions for the gain in Appendix A.2.
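The gains $g_\pm(\epsilon_0)$ can also be estimated numerically, without the analytic expressions of the appendix. The sketch below assumes a lossless single-mode Kerr cavity with equation of motion $\dot\alpha = -(\kappa/2 + i(\Delta + \chi|\alpha|^2))\alpha - \sqrt{\kappa}\,\epsilon$ and reflected amplitude $\epsilon' = \sqrt{\kappa}\,\alpha + \epsilon$, drops the $1/\sqrt{2}$ beamsplitter factors, and only treats the monostable regime $|\Delta| < \sqrt{3}\,\kappa/2$; these conventions are our assumptions. It solves the cubic steady-state equation for $n = |\alpha|^2$ and extracts $g_-$ and $g_+$ by central finite differences along the real and imaginary directions of $\delta\epsilon$.

```python
import numpy as np


def kerr_steady_state(eps, kappa, delta, chi):
    """Steady state of d alpha/dt = -(kappa/2 + i(delta + chi|alpha|^2)) alpha - sqrt(kappa) eps.

    n = |alpha|^2 solves chi^2 n^3 + 2 delta chi n^2 + (kappa^2/4 + delta^2) n = kappa |eps|^2,
    which has a single real root in the monostable regime."""
    coeffs = [chi**2, 2 * delta * chi, kappa**2 / 4 + delta**2, -kappa * abs(eps) ** 2]
    roots = np.roots(coeffs)
    n = roots[np.abs(roots.imag) < 1e-9].real
    n = n[n >= 0]
    if n.size != 1:
        raise ValueError("steady state not unique (bistable regime?)")
    return -np.sqrt(kappa) * eps / (kappa / 2 + 1j * (delta + chi * n[0]))


def small_signal_gains(eps0, kappa, delta, chi, h=1e-5):
    """Central-difference estimate of g_- and g_+ in d eps' = g_- d eps + g_+ d eps*."""
    def reflected(eps):
        return np.sqrt(kappa) * kerr_steady_state(eps, kappa, delta, chi) + eps
    d_re = (reflected(eps0 + h) - reflected(eps0 - h)) / (2 * h)             # g_- + g_+
    d_im = (reflected(eps0 + 1j * h) - reflected(eps0 - 1j * h)) / (2j * h)  # g_- - g_+
    return (d_re + d_im) / 2, (d_re - d_im) / 2


# Scan the bias amplitude for a detuning inside the monostable regime.
kappa, delta, chi = 1.0, -0.6, 1.0
biases = np.linspace(0.1, 1.5, 15)
gains = [small_signal_gains(e0, kappa, delta, chi) for e0 in biases]
max_quadrature_gain = max(abs(gm) + abs(gp) for gm, gp in gains)
```

Here $|g_-| + |g_+|$ is the largest gain over all input quadratures. For a lossless cavity the linearized map satisfies $|g_-|^2 - |g_+|^2 = 1$, which the scan reproduces and which serves as a sanity check; moving $\Delta$ closer to the critical detuning raises the peak gain.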

Placing two identical resonators in the arms of an interferometer allows for isolating
the signal and bias outputs even if their amplitudes vary, by canceling the scattered bias
in one output and the scattered signal in the other (cf. Figure 2). This highly symmetric
construction, which generalizes to any other optical nonlinearity, ensures that the signal
output is linear in $\delta\epsilon$ up to third order.¹ If the system parameters are well chosen, the
amplifier gain depends very strongly on small variations of the bias amplitude. This allows
the gain to be tuned from close to unity to its maximum value, which, for a given waveguide
coupling $\kappa$ and Kerr coefficient $\chi$, depends on the detuning of the drive from the cavity
resonance. For Kerr-nonlinear resonators there exists a critical detuning beyond which the
system becomes bistable and exhibits hysteresis. This can be used for thresholding-type
behavior, though, as shown in [46], in this case it may be advantageous to reduce the
symmetry of the circuit. It is convenient to engineer the relative propagation phases such
that, at maximum gain, a real quadrature input signal $x \in \mathbb{R}$ leads to an amplified output
signal $x' = g^{\max} x$ with no imaginary quadrature component (other than noise and
higher-order contributions). However, for different bias input amplitudes, and consequently
lower gain values, the output will generally feature a linear imaginary quadrature component
as well, $x' = [g_{rr}(\epsilon_0) + i g_{ir}(\epsilon_0)]\, x$. Figure 2(b) demonstrates this for a particular choice of
maximal gain. We note that there exist previous proposals of using nonlinear resonator pairs
inside interferometers to achieve desirable input-output behavior [36], but to our knowledge,
no one has proposed using these for signal/bias isolation and tunable gain. To first order,
the linearized Kerr model is actually identical to a sub-threshold degenerate OPO model.
This implies that it can be used to generate squeezed light and also that one could replace
the Kerr model by an OPO model.

¹ One can easily convince oneself that all even-order contributions are scattered into the bias output.

(a) Amplifier circuit (b) Gain vs. bias

Figure 2: (a) shows two identical single-mode Kerr-nonlinear optical resonators symmetrically
placed in the two arms of an interferometer; (b) gives the phase-sensitive amplifier gains
$g_{rr}(\epsilon_0)$ (green, solid) and $g_{ir}(\epsilon_0)$ (red, dashed) as a function of the bias photon input
rate normalized by the drive power at which dynamic resonance occurs. For completeness
we also provide $g_{ri}$ (black crosses) and $g_{ii}$ (black dots). The detuning has been chosen such
that $g^{\max} = g_{rr}(\epsilon_0^{\max}) = 20$. The dashed blue envelope gives the maximal input-output
gain achievable between any two signal quadratures at that bias. Note that $g_{rr}$ vanishes at
$\epsilon_0/\epsilon_0^{\max} \approx 0.8$.

An almost identical circuit, but featuring resonators with additional internal loss equal
to the waveguide coupling² and constantly biased to dynamic resonance, $(|\alpha|^2)_{\rm ss} = -\Delta/\chi$,
can be used to realize a quadrature filter, i.e., an element that has unity gain for the real
quadrature and zero gain for the imaginary one. The quadrature-filtered signal still has an
imaginary component, but to linear order this consists only of transmitted noise from the
additional internal loss. While it would be possible to add one of these downstream of every
tunable Kerr amplifier, in our specific application it is more efficient to add just a single one
downstream of where the individual amplifier outputs are summed (cf. Section 2.5). This
also reduces the total amount of additional noise introduced into the system.

² In the photonics community this is referred to as critically coupled, whereas the amplifier circuit would
ideally be strongly overcoupled such that additional internal losses are negligible.

2.2 Encoding and Storing the Gain

In the preceding section we have seen how to realize a tunable gain amplifier, but for
programming and storing this gain (or, equivalently, its bias amplitude) an additional
component is needed. Although it is straightforward to design a multi-stable system capable
of outputting a discrete set of different output powers to be used as the amplifier bias, such
schemes would likely require multiple nonlinear resonators, and it would be more
cumbersome to drive transitions between the output states.
An alternative to such schemes is given by systems that have a continuous set of stable
states. Recent analysis of continuous-time recurrent neural network models trained for
complex temporal information processing tasks has revealed multi-dimensional stable
attractors in the internal network dynamics that are used to store information over time.

A simple semi-classical nonlinear resonator model that exhibits this is given by a
non-degenerate optical parametric oscillator (NOPO) pumped above threshold. For low pump
input powers, this system allows for parametric amplification of a weak coherent signal (or
idler) input, and vacuum inputs for the signal and idler lead to outputs with zero
expected photon number. Above a critical threshold pump power, however, the system
down-converts pump photons into pairs of signal and idler photons.


Due to an internal $U(1)$ symmetry of the underlying Hamiltonian (cf. Appendix A.2.3),
the signal and idler modes spontaneously select phases that depend on each other
but are independent of the pump phase. This implies that there exists a whole manifold of
fixed points related to each other via the symmetry transformation
$(\alpha_s, \alpha_i) \to (\alpha_s e^{i\phi}, \alpha_i e^{-i\phi})$, where $\alpha_s$ and $\alpha_i$ are the rotating-frame signal and idler
mode amplitudes, respectively. Consequently, the signal output of an above-threshold NOPO
lives on a circular manifold (cf. Figure 3).
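The fixed-point manifold can be checked directly in a minimal semi-classical model. We assume the noiseless equations in the docstring below, with a real pump drive $\epsilon$, a hypothetical coupling constant $g$, signal/idler decay rate $\kappa$ and pump decay rate $\kappa_p$; these conventions are ours and not necessarily those of Appendix A.2.3. Above threshold the pump amplitude is clamped at $\kappa/(2g)$, and the residual of the equations of motion vanishes for every signal phase $\phi$.

```python
import numpy as np


def nopo_rhs(state, kappa, kappa_p, g, eps):
    """Assumed noiseless NOPO model for (signal s, idler i, pump p):
        ds/dt = -kappa/2 s + g p i*
        di/dt = -kappa/2 i + g p s*
        dp/dt = -kappa_p/2 p - g s i + sqrt(kappa_p) eps
    """
    s, i, p = state
    return np.array([
        -kappa / 2 * s + g * p * np.conj(i),
        -kappa / 2 * i + g * p * np.conj(s),
        -kappa_p / 2 * p - g * s * i + np.sqrt(kappa_p) * eps,
    ])


def fixed_point(phi, kappa, kappa_p, g, eps):
    """Above-threshold fixed point with signal phase phi: the pump is clamped at
    kappa/(2g) and |s|^2 = |i|^2 = (sqrt(kappa_p) eps - kappa kappa_p/(4g)) / g."""
    r2 = (np.sqrt(kappa_p) * eps - kappa * kappa_p / (4 * g)) / g
    if r2 <= 0:
        raise ValueError("pump below threshold")
    r = np.sqrt(r2)
    return np.array([r * np.exp(1j * phi), r * np.exp(-1j * phi), kappa / (2 * g) + 0j])


params = dict(kappa=1.0, kappa_p=2.0, g=0.5, eps=2.0)   # threshold at eps ~ 0.707
phis = np.linspace(0, 2 * np.pi, 9)
residuals = [np.max(np.abs(nopo_rhs(fixed_point(phi, **params), **params))) for phi in phis]
```

Since the residual vanishes for all $\phi$, nothing in these deterministic dynamics pins the signal phase; this neutral direction is what vacuum noise turns into phase diffusion and what a weak external signal can steer.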



(a) Combined bias (b) Gain vs. OPO phase

Figure 3: The NOPO’s signal output $\xi = \sqrt{\kappa}\,\alpha_s$ lives on a circular manifold parametrized by
$\phi$ (a, upper figure). Vacuum input shot noise leads to small fluctuations perpendicular to
the manifold and diffusion along it. Mixing this signal output with a constant bias offset on
a beamsplitter produces two outputs with anti-correlated total amplitude (a, lower figure).
When both outputs are used to drive a complementary pair of tunable amplifiers whose
outputs are subtracted, the overall real-to-real quadrature gain (green) of the system varies
from positive to negative values (b). We can also see that the real-to-imaginary gain (dashed
red) stays small for all NOPO phases, which allows us to efficiently remove it downstream
with the quadrature filter. The imaginary-to-real and imaginary-to-imaginary gains are also
plotted.


Vacuum shot noise on the inputs leads to phase diffusion with a rate $\gamma_\phi$ set by $\kappa$, the
signal and idler line width, and $n_0$, the steady-state intra-cavity photon number in
either mode. We point out that this diffusion rate does not directly depend on the strength
of the nonlinearity, which only determines how strongly the system must be pumped to
achieve a given intra-cavity photon number $n_0$.



















A weak external signal input breaks the symmetry and biases the signal output phase 
towards the external signal’s phase. This allows for changing the programmed phase value. 

Finally, we note that parametric oscillators can also be realized in materials with
vanishing $\chi^{(2)}$ nonlinearity. They have been successfully realized via four-wave mixing (i.e.,
exploiting a $\chi^{(3)}$ nonlinearity) [23, 42, 12] and even in optomechanical systems [8], in
which case the idler mode is given by a mechanical degree of freedom.

In principle any nonlinear optical system that has a stable limit cycle could be used to 
store and encode a continuous value in its oscillation phase. Non-degenerate parametric 
oscillators stand out because of their theoretical simplicity, which allows for a ‘static’
analysis inside a rotating frame.


2.3 Programmable Gain Amplifier 


Combining the circuits described in the preceding sections allows us to construct a fully
programmable phase-sensitive amplifier. In Figure 2(b) we see that there exists a particular
bias amplitude at which the real-to-real quadrature gain vanishes, $g_{rr}(\epsilon_0^{\min}) = 0$. We combine
the NOPO signal output $\xi = r e^{i\phi}$ with a constant-phase bias input $\xi_0$ (cf. Figure 3(a))
on a beamsplitter such that the outputs $(\xi_0 \pm r e^{i\phi})/\sqrt{2}$ vary between the zero-gain and
maximal-gain bias values $\epsilon_0^{\min}$ and $\epsilon_0^{\max}$. To realize both positive and negative gain, we
use the second output of that beamsplitter to bias another tunable amplifier. The two
amplifiers are always biased oppositely, meaning that one has maximal gain when the
other’s gain vanishes and vice versa. The overall input signal is split and sent through both
amplifiers and then recombined with a relative $\pi$ phase shift. This complementary setup
leads to an overall effective gain $G_{rr}(\phi)$ that is tunable from negative to positive values,
over a range symmetric about zero (cf. Figure 3(b)).
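The choice of bias offset and NOPO amplitude can be made concrete with a small calculation. Requiring the two beamsplitter outputs $(\xi_0 \pm r e^{i\phi})/\sqrt{2}$ to reach $\epsilon_0^{\max}$ and $\epsilon_0^{\min}$ at $\phi = 0$ gives $\xi_0 = (\epsilon_0^{\max} + \epsilon_0^{\min})/\sqrt{2}$ and $r = (\epsilon_0^{\max} - \epsilon_0^{\min})/\sqrt{2}$. The sketch below assumes a real $\xi_0$, uses $\epsilon_0^{\min}/\epsilon_0^{\max} = 0.8$ taken loosely from the Figure 2 caption, and only tracks bias magnitudes; the bias phase also varies with $\phi$, which a full amplifier model has to include.

```python
import numpy as np


def bias_pair(phi, xi0, r):
    """Beamsplitter outputs (xi0 +/- r e^{i phi})/sqrt(2) used as amplifier biases."""
    nopo = r * np.exp(1j * phi)
    return (xi0 + nopo) / np.sqrt(2), (xi0 - nopo) / np.sqrt(2)


def choose_offset_and_radius(eps_min, eps_max):
    """Real xi0 and r such that the bias magnitudes sweep [eps_min, eps_max]:
    at phi = 0 the first output is eps_max and the second eps_min, at phi = pi vice versa."""
    return (eps_max + eps_min) / np.sqrt(2), (eps_max - eps_min) / np.sqrt(2)


# Illustrative values in units of eps0_max.
xi0, r = choose_offset_and_radius(0.8, 1.0)
phis = np.linspace(0, 2 * np.pi, 13)
b_plus, b_minus = bias_pair(phis, xi0, r)
total_power = np.abs(b_plus) ** 2 + np.abs(b_minus) ** 2
```

The total bias power $\xi_0^2 + r^2$ is independent of $\phi$, so whenever one amplifier receives more bias the other receives less, which is the anti-correlation exploited by the complementary pair.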


In Figure 4 we present both the complementary pair of amplifiers and the NOPO used
for storing the bias, as well as some logic elements (described in Section 2.4) used for
implementing conditional training feedback. We call the full circuit a synapse because it
features programmable gain and implements the perceptron’s conditional weight update rule.



Figure 4: Synapse circuit composed of a programmable amplifier and feedback logic (cf.
Section 2.4) that implements the perceptron learning feedback for a single weight. The
upper amplifier, when biased optimally, leads to positive gain, whereas the lower amplifier
leads to negative gain due to the additional $\pi$ phase shift.


The resulting synapse model is quite complex and certainly optimized not for a minimal
component number but for ease of theoretical analysis. A more resource-efficient
programmable amplifier could easily be implemented using just two or three nonlinear
resonators. E.g., inspecting the real-to-imaginary quadrature gain $g_{ir}(\epsilon_0)$ in Figure 2(b),
we see that close to $\epsilon_0^{\max}$ it passes through zero fairly linearly and with an almost symmetric
range. This indicates that we could use a single tunable amplifier to realize both positive
and negative gain. Using only a single resonator for the tunable amplifier could work as
well, but it would require careful interferometric bias cancellation and more tedious upfront
analysis. We do not think it is feasible to use just a single resonator for both the parametric
oscillator and the amplifier, because any amplified input signal would have an undesirable
back-action on the oscillator phase.

2.4 Optical Switches 

The feedback to the perceptron weights (cf. Equation (3)) is conditional on the binary
values of the given and estimated class labels $y$ and $\hat y$, respectively. The logic necessary
for implementing this can be realized by means of all-optical switches. There have been
various proposals and demonstrations [37, 35] of all-optical gates/switches and quantum
optical switches [30].



(a) Fredkin gate and thresholder (b) Thresholder input/output


Figure 5: In the upper graphic of (a) we present a schematic for a Fredkin gate based on a
two-mode cross-Kerr-nonlinear resonator. The lower graphic shows how this circuit can be
prepended with a single-mode nonlinear resonator to better approximate a thresholding
response. In (b) we present the input-output characteristic of the prepended resonator
(upper left), the Fredkin gate (upper right) and the combined input-output relationship
between the inner product amplitude $s$ and the estimated class label $\hat y$.

The model that we assume here (cf. Figure 5) uses two different modes of a resonator
that interact via a cross-Kerr effect, i.e., power in the control mode leads to a refractive
index shift (or detuning) for the signal mode. The index shift translates to a
control-mode-dependent phase shift of a scattered signal field, yielding a controlled optical
phase modulator. Wrapping this phase modulator in a Mach-Zehnder interferometer then
realizes a controlled switch: if the control mode input is in one of two different states,
$|\xi| \in \{0, \xi_0\}$, the signal inputs are either passed through or switched. This operation is
often referred to as a controlled swap or Fredkin gate, which was originally proposed for
realizing reversible computation. This dispersive model has the advantage that the control
input signal can be reused.

Note that at control input amplitudes significantly different from the two control levels 
the outputs are coherent mixtures of the inputs, i.e., the switch then realizes a tunable 
beamsplitter. 

Finally, we point out that using two different (frequency non-degenerate) resonator 
modes has the advantage that the interaction between control and signal inputs is phase 
insensitive which greatly simplifies the design and analysis of cascaded networks of such 
switches. 
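The switching logic of the Mach-Zehnder wrapper can be sketched with $2 \times 2$ transfer matrices. The symmetric 50:50 beamsplitter convention and the linear dependence of the phase on control power in the hypothetical `control_phase` function are assumptions for illustration only; an actual cross-Kerr resonator has a nonlinear phase response. In this convention the inputs are swapped without control and passed through at $|\xi| = \xi_0$, and intermediate control levels give a tunable beamsplitter, as described above.

```python
import numpy as np

# Assumed symmetric 50:50 beamsplitter convention.
BS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)


def mzi(phi):
    """Transfer matrix of a Mach-Zehnder interferometer with phase shift phi in arm 1."""
    return BS @ np.diag([np.exp(1j * phi), 1.0]) @ BS


def control_phase(xi, xi0):
    """Hypothetical cross-Kerr response: phase grows linearly with control power
    and reaches pi at |xi| = xi0."""
    return np.pi * abs(xi) ** 2 / abs(xi0) ** 2


xi0 = 1.0
M_off = mzi(control_phase(0.0, xi0))                 # no control: inputs swapped
M_on = mzi(control_phase(xi0, xi0))                  # control at xi0: inputs passed through
M_mid = mzi(control_phase(xi0 / np.sqrt(2), xi0))    # intermediate control: 50:50 mixing
```

All three matrices are unitary, so the switch itself is lossless in this idealization; only the control-dependent phase distinguishes the pass, swap and mixing regimes.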


2.5 Generation of the Estimated Label 


The estimated class label $\hat y$ should be a step function applied to the inner product of
the weight vector and the input. In the preceding sections we have shown how individual
inputs $x_j$ can be amplified with programmable gain to give $s_j = G(\phi_j)\, x_j$, thus realizing the
individual contributions to the inner product. These are then summed on an $N$-port
beamsplitter, one output of which gives the uniformly weighted sum $s$ of the $s_j$.

The gain factors $G(\phi_k) = G_{rr}(\phi_k) + iG_{ir}(\phi_k)$ generally have an unwanted imaginary
part, which we remove by passing the summed output through a quadrature filter circuit (cf.
the last paragraph of Section 2.1). This filter has unit gain for the real quadrature and zero
gain for the imaginary quadrature, leading to an overall output
$\mathrm{Re}\, s = \sum_{k=1}^{N} G_{rr}(\phi_k)\, x_k$. The thresholding circuit should now produce a high output if
$\mathrm{Re}\, s > 0$ and a zero output if $\mathrm{Re}\, s < 0$.

It turns out that the optical Fredkin gate described in the previous section already works
almost as a two-mode thresholder, where the control input leads to a step-like response in
the signal outputs: a constant signal input amplitude, which encodes the logical ‘1’ state, is
applied to one of the signal inputs. When the control input amplitude is varied from zero
to $\xi_0$, the signal output turns on fairly abruptly at some threshold $\xi_{\rm th} < \xi_0$. To make the
thresholding phase sensitive, the control input is given by the sum of $s$ and a constant offset
$s_0$ that provides a phase reference: $c = \frac{1}{\sqrt{2}}(s + s_0)$.

For a Fredkin gate operated with continuous control inputs, the signal output is almost
zero for a considerable range of small control inputs. However, for very high control inputs,
i.e., significantly above $\xi_0$, the signal output decreases instead of staying constant, as would
be desirable for a step-function-like profile. We found that this issue can be addressed by
transmitting the control input $c$ through a single-mode Kerr-nonlinear cavity with output $d$,
whose resonance frequency is chosen such that the transmission gain $|d/c|$ is peaked close to
$d = \xi_0$. For larger input amplitudes $|c|$, the transmission gain is lower (although $|d|$ still grows
monotonically with $|c|$), which extends the input range over which the subsequent Fredkin
gate stays in the on-state.


3 Results 

The perceptron's SDEs were simulated using a newly developed custom software package named QHDLJ [28], implemented in Julia [3], which allows for dynamic compilation of circuit models to LLVM [25] bytecode that runs at speeds comparable to C/C++. All individual simulations can be carried out on a laptop, but the results in Figure 8 were obtained by averaging over 100 stochastic simulation runs on an HP ProLiant server with 80 cores. The current version of QHDLJ uses one process per trajectory, but the code could easily be vectorized.

In Figure 6 we present an example of a single application of an $N = 8$ perceptron, including both a learning stage with pre-labeled training data and a classification testing stage in which the perceptron's estimated class labels are compared with their correct values. The data to be classified here are sampled from a different 8-dimensional Gaussian distribution for each class label, with their mean vectors separated by a distance $\|\mu_1 - \mu_0\|_2/\sigma = 2$ relative to the standard deviation $\sigma$ of both individual clusters. For each sample the input was held constant for a duration $\Delta t = 2\kappa^{-1}$, where $\kappa$ is the NOPO signal and idler line width. The perceptron was first trained with $M_{\rm train} = 100$ training examples and subsequently tested on $M_{\rm test} = 100$ test examples with the learning feedback turned off.

In Figure 7 we visualize linear projections of the testing data as well as the estimated classification boundaries. We can see that the classifier performs very well far away from the decision boundary. Close to the decision boundary there are some misclassified examples. We proceed to compare the performance of the classifier to the theoretically optimal performance achievable by any classifier, and to that of the optimal classifier model for this scenario, Gaussian Discriminant Analysis (GDA) [15, 29], implemented in software. Using the identical perceptron model as above and an identical training/testing procedure, we estimate the error rate $p_{\rm err} = \mathbb{P}[\hat y \neq y]$ of the trained perceptron as a function of the cluster separation $\|\mu_1 - \mu_0\|_2/\sigma$. The results are presented in Figure 8(a). Identically distributed training and testing data were used to evaluate the performance of the GDA algorithm, and both results are compared to the theoretically optimal error rate for this discrimination task, which can be computed analytically to be

$$p_{\rm err,optim.} = \frac{1}{2}\,\mathrm{erfc}\left(\frac{\|\mu_1 - \mu_0\|_2}{2\sqrt{2}\,\sigma}\right),$$

where $\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^{\infty} e^{-u^2}\,du$ is the complementary error function. We see that the all-optical perceptron's performance is comparable to GDA's performance for this problem and that both algorithms attain performance close to the theoretical optimum.
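To make the comparison concrete, the sketch below evaluates the analytical bound and estimates the test error of a software GDA baseline on the same kind of data. It is a minimal illustration, not the authors' code: it assumes $\sigma = 1$, equal class priors, class means separated along the first coordinate, and a pooled-covariance (linear) GDA; the function names are hypothetical.

```python
import math
import numpy as np

def optimal_error_rate(separation):
    """Bayes error for two isotropic Gaussian classes with equal priors.

    separation -- ||mu_1 - mu_0||_2 / sigma
    """
    return 0.5 * math.erfc(separation / (2 * math.sqrt(2)))

def gda_error_rate(separation, dim=8, m_train=100, m_test=100_000, seed=0):
    """Monte Carlo test error of a software GDA classifier with pooled covariance."""
    rng = np.random.default_rng(seed)
    mu0 = np.zeros(dim)
    mu1 = np.zeros(dim)
    mu1[0] = separation                          # sigma = 1, offset along the first axis

    def sample(m):
        y = rng.integers(0, 2, size=m)           # equal class priors
        x = rng.standard_normal((m, dim)) + np.where(y[:, None] == 1, mu1, mu0)
        return x, y

    x_tr, y_tr = sample(m_train)
    m0 = x_tr[y_tr == 0].mean(axis=0)            # estimated class means
    m1 = x_tr[y_tr == 1].mean(axis=0)
    resid = x_tr - np.where(y_tr[:, None] == 1, m1, m0)
    cov = resid.T @ resid / len(x_tr)            # pooled (shared) covariance estimate
    w = np.linalg.solve(cov, m1 - m0)            # linear discriminant direction
    b = -0.5 * w @ (m0 + m1)                     # threshold for equal priors
    x_te, y_te = sample(m_test)
    y_hat = (x_te @ w + b > 0).astype(int)
    return float(np.mean(y_hat != y_te))
```

At the separation used in Figures 6 and 7, `optimal_error_rate(2.0)` is about 0.159, so roughly one sample in six is misclassified even by the best possible classifier; the finite training set pushes the GDA estimate somewhat above that bound.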

The learning rate of the perceptron is determined by two things: the overall strength of the learning feedback and the time for which each example is presented to the circuit. In Figure 8(b) we plot the estimated error rate for varying feedback strength and duration. As can be expected intuitively, we find that there are trade-offs between speed (smaller $\Delta t$ preferable) and energy consumption (smaller $|\alpha|$ preferable).


3.1 Time scales and power budget 

Here we roughly estimate the power consumption of the whole device and discuss how to 
scale it up to a higher input dimension. 

Any real-world implementation will depend strongly on the engineering paradigm, i.e., 
the choice of material/nonlinearity as well as the engineering precision, but based on recently 
achieved progress in nonlinear optics we will estimate an order of magnitude range for the 
input power. 

The signal and feedback input power to the circuit will scale linearly with the number of synapses $N$.

The bias inputs for the amplifiers have to be larger than the signal to ensure linear operation, but it should be expected that some of the scattered bias amplitudes can be reused to power multiple synapses.
Figure 6: Single trajectory divided into a training interval $0 < t < M_{\rm train}\Delta t$ during which the learning feedback is active and a test interval $M_{\rm train}\Delta t < t < (M_{\rm train} + M_{\rm test})\Delta t$. During training and testing, respectively, the system is driven by $M_{\rm train} = M_{\rm test} = 100$ separate input states which are held constant for an interval $\Delta t = 2\kappa^{-1}$. The estimated class label is discretized by averaging the output intensity over each input interval, dividing the result by the intensity $|\xi|^2$ corresponding to the logical '1' output state and rounding. The upper panel compares the correct class label $y$ (green) with the estimated class label $\hat y$ (black) during training and testing, respectively. The area between them indicates errors or at least lag of the estimator and is shaded in light red. The second panel shows occurrences of classification errors (red vertical bars). The slight shading near the beginning and the end of the trajectory in the second panel visualizes the segments corresponding to the upper left and right panel, respectively. The third panel shows the learned linear amplitude gains for each synapse. After the learning feedback is turned off at $t = M_{\rm train}\Delta t$, they diffuse slightly due to optical shot noise.
Figure 7: Projection of training data and classification boundaries. The data has been rotated such that the $s_1$ coordinate lines up with the learned normal vector of the separating hyperplane. Incorrectly classified data are plotted in red. The faint blue (red) lines visualize the evolution of the classifier boundary during training (testing).


In our models we have defined all rates relative to the line width of the signal and idler 
mode of the NOPO, because this is the component that should necessarily have the smallest 
decay rate to ensure a long lifetime for the memory. 

All other resonators are employed as nonlinear input-output transformation devices and 
therefore a high bandwidth (corresponding to much lower loaded quality factor) is necessary 
for achieving a high bit rate. For our simulations we typically assumed quality factors that 
were lower than the NOPO’s by 1-2 orders of magnitude. 

Based on self-oscillation threshold powers reported in [23, 12, 26, 38] and the switching powers of [35], we estimate the necessary power per synapse to be in the range of $\sim 10$ to $100\,\mu$W. By re-using the scattered pump and bias fields it should be possible to reduce the power consumption per amplifier even further. Even for the continuous-wave signal paradigm we have assumed (as opposed to pulsed/spiking signals such as considered in [33]), the devices proposed here could be competitive with the current state-of-the-art CMOS-based neuromorphic electrical circuits [6].

In the simulations for the 8-dimensional perceptron our input rate for training data was set to $\Delta t^{-1} = \kappa/2$. The corresponding sample duration $\Delta t$ is roughly ten times the average feedback delay time between the arrival of an input pattern and the conditional switching of the feedback logic upon arrival of the generated estimated state label $\hat y$. This time can be estimated as $T_{\rm fb}(n) \approx \frac{G_{\max}}{\kappa_A} + \frac{1}{\kappa_{QF}} + \frac{1}{\kappa_{\rm thresh}} + \frac{n}{\kappa_F}$, where $n$ is the index of the synaptic weight, $G_{\max}$ is the amplifier gain range, and $\kappa_A$, $\kappa_{QF}$, $\kappa_{\rm thresh}$ and $\kappa_F$ are the line widths of the amplifier, the quadrature filter, the combined thresholding circuit (cf. Figure 5) and the feedback Fredkin gates. There is a contribution scaling with $n$ because the feedback traverses the individual weights sequentially to save power.
(a) Error rate vs. hardness
(b) Error rate vs. learning parameters

Figure 8: The perceptron's error rate vs. the difficulty of the classification task and as a function of the parameters determining the learning rate. In Figure (a) we compare the unoptimized performance of the perceptron circuit (red diamonds) to the optimal performance bound (solid, green) as well as a GDA (blue X's) trained on the same number of training examples. We show averages over 100 trials at each cluster separation. The GDA data was similarly averaged over 100 trials. The transparent envelopes indicate the sample standard deviation. The black dots show the perceptron performance when simulated without shot noise. We see that the shot noise has very little effect. In Figure (b) we plot the average error rate (averaged over 50 trials) at fixed cluster separation $\|\mu_1 - \mu_0\|_2/\sigma = 2$ for various values of the time interval $\Delta t$ for which each data sample is presented to the circuit as well as the strength of the training feedback $\alpha$. The total number of feedback photons $N_{\rm fb} = |\alpha|^2\Delta t$ per sample is constant along the faint dashed lines and the actual value is indicated on the right. A good choice of parameters is characterized both by low feedback power (small $|\alpha|^2$) and high input rate (short sample time $\Delta t$) while still resulting in a low classification error rate. The X marks the parameters used for the results in (a) and the previous figures.
When scaling up the perceptron to a higher dimension while retaining approximately the same input signal powers, it is intuitively clear that the combined 'inner product' signal amplitude $s$ scales as $s \propto \sqrt{N}\,s_1$, where $s_1$ is the signal amplitude for a single input. This allows us to similarly scale up the amplitude $\zeta_0$ of the signal encoding the generated estimated state label $\hat y$ and consequently the bandwidth of the feedback Fredkin gates that it drives. A detailed analysis reveals that the Fredkin gate threshold scales as $\sqrt{N}$; in particular we find that $\sqrt{|\chi|}\,\zeta_0 \propto \kappa_F \propto \sqrt{|\chi|}\,c \propto \kappa_{\rm thresh} \propto \sqrt{|\chi|}\,s \propto \sqrt{N}\sqrt{|\chi|}\,s_1$. The first two scaling relationships are due to the constraints on the Fredkin gate construction (cf. Appendix A.2.2); the next two follow from demanding that the additional thresholding resonator be approximately dynamically resonant at the highest input level (cf. Appendices A.2.1 and A.2.2). The last proportionality is simply due to the amplitude summation at the $N$-port beamsplitter.

This reveals that when increasing $N$, the perceptron as constructed here would have to be driven at a lower input bit rate scaling as $\Delta t^{-1} \propto N^{-1/2}$, or alternatively be driven with higher signal input powers. A possible solution that could greatly reduce the difference in arrival time $\sim \kappa_F^{-1}$ at each synapse could be to increase the waveguide coupling to the control signal and thus decrease the delay per synapse. The resulting increase in the required control amplitude $\zeta_0$ can be counteracted with feedback, i.e., by effectively creating a large cavity around the control loop. When even this strategy fails, one could add fan-out stages for $\hat y$, which introduce a delay that grows only logarithmically with $N$.
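The competing scalings can be illustrated with a toy delay model. The sketch below is an assumption-laden illustration rather than a circuit simulation: it takes one Fredkin-gate delay $\sim\kappa_F^{-1}$ per synapse, lets $\kappa_F$ grow as $\sqrt{N}$ as in the scaling chain above, models a fan-out tree as one stage delay per doubling of $N$, and chooses $\Delta t$ about ten times the feedback delay as in our simulations. The function names and the reference dimension `n_ref` are hypothetical.

```python
import math

def sequential_feedback_delay(n, kappa_f_ref, n_ref=8):
    """Toy delay for the label to reach the last of n synapses one after another.

    Assumes one Fredkin-gate delay ~ 1/kappa_F per synapse and kappa_F growing
    as sqrt(n), normalized so that kappa_F = kappa_f_ref at n = n_ref.
    """
    kappa_f = kappa_f_ref * math.sqrt(n / n_ref)
    return n / kappa_f

def fanout_feedback_delay(n, kappa_stage):
    """Toy binary fan-out tree for the label: one stage delay per doubling of n."""
    return math.ceil(math.log2(n)) / kappa_stage

def max_input_rate(delay, margin=10.0):
    """Sample rate 1/Delta_t if Delta_t is chosen ~margin times the feedback delay."""
    return 1.0 / (margin * delay)
```

In this toy model, quadrupling $N$ from 8 to 32 doubles the sequential delay (an input rate scaling as $N^{-1/2}$), whereas the fan-out delay grows by only two stages.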

Finally, we note that the bias power of all the Kerr-effect based models considered here scales inversely with the respective nonlinear coefficient, $\{|\zeta_0|^2, |s|^2\} \times |\chi| \sim \mathrm{const}$, when keeping the bandwidth fixed. This implies that improvements in the nonlinear coefficient translate to lower power requirements or, alternatively, a faster speed of operation.


4 Conclusion and Outlook 

In conclusion, we have shown how to design an all-optical device that is capable of supervised learning from input data, by describing how tunable gain amplifiers with signal/bias isolation can be constructed from nonlinear resonators and subsequently combined with self-oscillating resonators to encode the programmed amplifier gain in their oscillation phase. By considering a few additional nonlinear devices for thresholding and all-optical switching, we then show how to construct a perceptron, including the perceptron feedback rule. To our knowledge this is the first end-to-end description of an all-optical circuit capable of learning from data. We have furthermore demonstrated that, despite optical shot noise, it nearly attains the performance of the optimal software algorithm for the classification task that we considered. Finally, we have discussed the relevant time scales and pointed out how to scale the circuit up to large input dimensions while retaining the signal processing bandwidth and a low power consumption per input.

Possible applications of an all-optical perceptron are as the trainable output filter of an 
optical reservoir computer or as a building block in a multi-layer all-optical neural network. 

The programmable amplifier could be used as a building block to construct other learning 
models that rely on continuously tunable gain such as Boltzmann machines and hardware 
implementations of message passing algorithms. 

An interesting next step would be to design a perceptron that can handle inputs at different carrier frequencies. In this case wavelength division multiplexing (WDM) might significantly reduce the physical footprint of the device.

A simple modification of the perceptron circuit could autonomously learn to invert linear transformations that were applied to its input signals. This could be used for implementing a circuit capable of solving linear regression problems. In combination with multi-mode optical fibers, such a device could also have applications in all-optical sensing.

Finally, an extremely interesting question is whether harnessing quantum dynamics 
could lead to a performance increase. We hope to address these ideas in future work. 
A Basic Component Models

Here we present the component models used to build the perceptron circuit. We will first describe the static components such as beamsplitters, phase shifts and coherent displacements, then proceed to describe the different Kerr-nonlinear models, and finally the NOPO model.


A.1 Static, Linear Circuit Components

All of these components have in common that they have no internal dynamics, implying that the $A$, $B$ and $C$ matrices and the $a$-vector are empty (have zero elements), and $a_{\rm NL}$ is not defined.


A.1.1 Constant Laser Source

The simplest possible static component is given by a single input/output coherent displacement with coherent amplitude $\eta$. This model is employed to realize static coherent input amplitudes. The $D$ matrix is trivially given by $D = (1)$ and the coherent amplitude is encoded in $c = (\eta)$. This leads to the desired input-output relationship $\beta_{\rm out} = \eta + \beta_{\rm in}$. For completeness we also provide the SLH [20] model $((1), (\eta), 0)$.

A.1.2 Static Phase Shifter

The static single input/output phase shifter has $D = (e^{i\phi})$ and $c = (0)$, leading to an input-output relationship of $\beta_{\rm out} = e^{i\phi}\beta_{\rm in}$. Its SLH model is $((e^{i\phi}), (0), 0)$.


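A.1.3 Beamsplitter

The static beamsplitter mixes (at least) two input fields and can be parametrized by a mixing angle $\theta$. It has

$$D = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

and $c = (0, 0)^T$. This leads to an input-output relationship

$$\begin{pmatrix} \beta_{{\rm out},1} \\ \beta_{{\rm out},2} \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \beta_{{\rm in},1} \\ \beta_{{\rm in},2} \end{pmatrix}. \tag{8}$$

Its SLH model is $\left(\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \end{pmatrix}, 0\right)$.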
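Summation on the $N$-port beamsplitter of Section 2.5 can be pictured with the two-port element of Eq. (8). The Python sketch below assumes, for illustration only, that $N$ is a power of two and that the $N$-port combiner is realized as a binary tree of 50:50 beamsplitters ($\theta = \pi/4$) of which only output port 2 is kept at each stage; this is not necessarily how the circuit is built, and the function names are hypothetical.

```python
import numpy as np

def beamsplitter(theta):
    """Scattering matrix D of the static beamsplitter, Eq. (8)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def tree_combiner(amplitudes):
    """Combine N = 2**k input amplitudes with a binary tree of 50:50 beamsplitters.

    Hypothetical construction: at each stage, neighbouring fields are mixed
    with theta = pi/4 and only output port 2, (b1 + b2)/sqrt(2), is kept.
    """
    fields = np.asarray(amplitudes, dtype=complex)
    n = len(fields)
    if n == 0 or n & (n - 1):
        raise ValueError("number of inputs must be a power of two")
    D = beamsplitter(np.pi / 4)
    while len(fields) > 1:
        pairs = fields.reshape(-1, 2)        # (beta_in1, beta_in2) for each splitter
        out = pairs @ D.T                    # beta_out = D beta_in, row by row
        fields = out[:, 1]                   # keep port 2 of every splitter
    return fields[0]
```

For any inputs the result equals $\frac{1}{\sqrt{N}}\sum_k \beta_k$, the uniformly weighted sum; the discarded ports carry away the remaining power, which is why a passive uniform combiner necessarily includes the $1/\sqrt{N}$ factor.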

A.2 Resonator Models 

We consider resonator models with m internal modes and n external inputs and outputs. 
We assume for simplicity that a = 0 and c = 0 meaning that we will model all coherent 
displacements explicitly in the fashion described above. We also assume that their scattering 
matrices are trivially given by $D = 1_n$, which means that far off-resonant input fields are
simply reflected without a phase shift. Furthermore, none of our assumed models feature 
linear coupling between the internal cavity modes. This implies that the A-matrix is always 
diagonal. We are always working in a rotating frame. 



A.2.1 Single mode Kerr-nonlinear Resonator 

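A Kerr nonlinearity is modeled by the nonlinear term $a_{\rm NL}^{\rm Kerr}(\alpha) = -i\chi|\alpha|^2\alpha$, which can be understood as an intensity-dependent detuning. The $A$-matrix is given by $A = \left(-\frac{\kappa_T}{2} - i\Delta\right)$, its $B$-matrix is $B = -(\sqrt{\kappa_1}, \sqrt{\kappa_2}, \ldots, \sqrt{\kappa_n})$, where the total line width is given by $\sum_{j=1}^{n}\kappa_j = \kappa_T$, and the cavity detuning from any external drive is given by $\Delta$. The $C$-matrix is given by $C = -B^T$. The corresponding SLH model is

$$\left(1_n,\; \begin{pmatrix} \sqrt{\kappa_1}\,a \\ \vdots \\ \sqrt{\kappa_n}\,a \end{pmatrix},\; \tilde\Delta\, a^\dagger a + \frac{\chi}{2}\, a^{\dagger 2} a^2\right), \tag{9}$$

where the detuning differs slightly, $\tilde\Delta = \Delta + \chi$, as can be shown in the derivation of the Wigner formalism.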

The special case of a single mirror with coupling rate $\kappa$ and negligible internal losses is of interest for constructing the phase-sensitive amplifier described in Section 2.1. Considering again an input given by a large static bias and a small signal, $\epsilon = \frac{1}{\sqrt{2}}(\epsilon_0 + \delta\epsilon)$, the steady-state reflected amplitude is to first order

$$\epsilon' = \frac{1}{\sqrt{2}}\left[\eta\,\epsilon_0 + g_-(\epsilon_0)\,\delta\epsilon + g_+(\epsilon_0)\,\delta\epsilon^*\right]. \tag{10}$$


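For negligible internal losses we can provide exact expressions for $\eta$, $g_+$ and $g_-$. Rather than parametrizing these by the bias $\epsilon_0$, we parametrize them by the mean coherent intra-cavity amplitude $\alpha_0$. When the system is not bistable (see below), relationship (14) defines a one-to-one map between $\epsilon_0$ and $\alpha_0$.

$$\eta = -\frac{\kappa/2 - i(\Delta + \chi|\alpha_0|^2)}{\kappa/2 + i(\Delta + \chi|\alpha_0|^2)} \quad\Rightarrow\quad |\eta| = 1, \tag{11}$$

$$g_- = 1 + \frac{\kappa\left[-\frac{\kappa}{2} + i\Delta + 2i\chi|\alpha_0|^2\right]}{\left(\frac{\kappa}{2}\right)^2 + (\Delta + 2\chi|\alpha_0|^2)^2 - |\chi|^2|\alpha_0|^4}, \tag{12}$$

$$g_+ = \frac{i\kappa\chi\alpha_0^2}{\left(\frac{\kappa}{2}\right)^2 + (\Delta + 2\chi|\alpha_0|^2)^2 - |\chi|^2|\alpha_0|^4}, \tag{13}$$

$$\epsilon_0 = -\sqrt{\frac{2}{\kappa}}\left[\frac{\kappa}{2} + i(\Delta + \chi|\alpha_0|^2)\right]\alpha_0. \tag{14}$$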

(14) 
The Kerr cavity exhibits bistability for a particular interval of bias amplitudes if and only if $\Delta/\chi < 0$ and $|\Delta| > \frac{\sqrt{3}}{2}\kappa_T =: \Delta_{\rm th}$.

At any fixed bias amplitude and corresponding internal steady-state mode amplitude, the maximal gain experienced by a small signal is given by $g_{\max} = |g_-| + |g_+|$. Here maximal means that we maximize over all possible signal input phases relative to the bias input. To experience this gain, the signal has to be in an appropriate quadrature defined by $\arg\delta\epsilon = \frac{1}{2}(\arg g_+ - \arg g_-)$. The orthogonal quadrature is then maximally de-amplified by a gain of $\big||g_-| - |g_+|\big|$, and it is possible to show that for negligible losses the perfect squeezing relationship $(|g_-| + |g_+|)\,\big||g_-| - |g_+|\big| = |g_-|^2 - |g_+|^2 = 1$ holds for any bias amplitude. Furthermore, for fixed cavity parameters $g_{\max}$ is maximized at a particular non-zero intra-cavity photon number


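$$|\alpha_0^{\max}|^2 = \frac{\sqrt{4\Delta^2 + \kappa^2}}{2\sqrt{3}\,|\chi|} \quad\Rightarrow\quad g_{\max} = \sqrt{\frac{f + \kappa}{f - \kappa}}, \tag{15}$$

$$\text{with}\quad f = \sqrt{28\Delta^2 + 4\kappa^2 - 8|\Delta|\sqrt{12\Delta^2 + 3\kappa^2}}. \tag{16}$$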

Note that the maximal gain does not depend on the strength of the nonlinearity. The relationship between $g_{\max}$ and $\Delta$ can be inverted:

$$|\Delta| = \frac{\kappa\left[\sqrt{3}\,(g_{\max}^2 + 1) - 4\,g_{\max}\right]}{2\,(g_{\max}^2 - 1)}. \tag{17}$$


Using all this, it is straightforward to construct a tunable Kerr amplifier. The symmetric construction proposed in Section 2.1 provides the additional advantage that one does not have to cancel the scattered bias. It is also convenient to prepend and append phase shifters to the signal input and output that ensure $g_- = g_+ = g_{\max}/2$ at maximum gain.
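Equations (11) to (17) are easy to check numerically. The following Python sketch is a minimal illustration assuming a lossless single-port cavity, a real intra-cavity amplitude $\alpha_0$, and example parameters $\kappa = 1$, $\Delta = 0.6$, $\chi = -1$ (so that $\Delta/\chi < 0$ and $|\Delta| < \Delta_{\rm th}$). It evaluates $\eta$ and $g_\pm$ on a grid of photon numbers, and the function names are hypothetical.

```python
import numpy as np

def kerr_gains(alpha0, kappa, delta, chi):
    """eta, g_minus, g_plus of a lossless single-port Kerr cavity, Eqs. (11)-(13)."""
    n = np.abs(alpha0) ** 2
    theta1 = delta + chi * n                 # effective detuning seen by the bias
    theta2 = delta + 2 * chi * n             # effective detuning seen by small signals
    denom = (kappa / 2) ** 2 + theta2 ** 2 - chi ** 2 * n ** 2
    eta = -(kappa / 2 - 1j * theta1) / (kappa / 2 + 1j * theta1)
    g_minus = 1 + kappa * (-kappa / 2 + 1j * theta2) / denom
    g_plus = 1j * kappa * chi * alpha0 ** 2 / denom
    return eta, g_minus, g_plus

def g_max_closed_form(kappa, delta):
    """Eqs. (15)-(16): largest achievable small-signal gain."""
    f = np.sqrt(28 * delta**2 + 4 * kappa**2
                - 8 * abs(delta) * np.sqrt(12 * delta**2 + 3 * kappa**2))
    return np.sqrt((f + kappa) / (f - kappa))

def detuning_for_gain(g, kappa):
    """Eq. (17): |Delta| required for a target maximal gain g >= sqrt(3)."""
    return kappa * (np.sqrt(3) * (g**2 + 1) - 4 * g) / (2 * (g**2 - 1))

kappa, delta, chi = 1.0, 0.6, -1.0
alpha0 = np.sqrt(np.linspace(0.0, 2.0, 200_001))   # real amplitudes on a photon-number grid
eta, gm, gp = kerr_gains(alpha0, kappa, delta, chi)
gain = np.abs(gm) + np.abs(gp)                      # g_max at each bias point
i_best = np.argmax(gain)
```

For these parameters the best gain is roughly 6.7, reached near $|\alpha_0|^2 \approx 0.45$, and the grid maximum agrees with Eqs. (15) and (16). The sketch also confirms $|\eta| = 1$ and $|g_-|^2 - |g_+|^2 = 1$ at every grid point, and that Eq. (17) recovers $\Delta$.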

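The quadrature filter construction relies on the presence of additional cavity losses that are equal to the input coupler, $\kappa_2 = \kappa_1 = \kappa$. In this case the gain coefficients for reflection off the first port are given by

$$g_- = 1 + \frac{\kappa\left[-\kappa + i\Delta + 2i\chi|\alpha_0|^2\right]}{\kappa^2 + (\Delta + 2\chi|\alpha_0|^2)^2 - |\chi|^2|\alpha_0|^4}, \tag{18}$$

$$g_+ = \frac{i\kappa\chi\alpha_0^2}{\kappa^2 + (\Delta + 2\chi|\alpha_0|^2)^2 - |\chi|^2|\alpha_0|^4}, \tag{19}$$

$$\epsilon_0 = -\sqrt{\frac{2}{\kappa}}\left[\kappa + i(\Delta + \chi|\alpha_0|^2)\right]\alpha_0, \tag{20}$$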


and one may easily verify that for dynamic resonance, i.e., $\chi|\alpha_0|^2 = -\Delta$, the gain coefficients are equal in magnitude, $|g_-| = |g_+|$, which implies that there exists an input phase for which the reflected signal vanishes.
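The dynamic-resonance claim can be verified numerically. The sketch below is illustrative only. It assumes example values $\kappa = 1$, $\Delta = 0.5$ and $\chi = -0.1$, so that the resonance condition fixes $|\alpha_0|^2 = -\Delta/\chi$, and then evaluates Eqs. (18) and (19). Finally it picks the signal phase $\theta$ with $e^{2i\theta} = -g_+/g_-$, for which the reflected small-signal amplitude $g_-\,e^{i\theta} + g_+\,e^{-i\theta}$ cancels. The helper name is hypothetical.

```python
import numpy as np

def filter_gains(alpha0, kappa, delta, chi):
    """g_minus, g_plus for reflection off port 1 with kappa_1 = kappa_2 = kappa, Eqs. (18)-(19)."""
    n = abs(alpha0) ** 2
    theta2 = delta + 2 * chi * n
    denom = kappa**2 + theta2**2 - chi**2 * n**2
    g_minus = 1 + kappa * (-kappa + 1j * theta2) / denom
    g_plus = 1j * kappa * chi * alpha0**2 / denom
    return g_minus, g_plus

kappa, delta, chi = 1.0, 0.5, -0.1
alpha0 = np.sqrt(-delta / chi)               # dynamic resonance: chi |alpha0|^2 = -delta
gm, gp = filter_gains(alpha0, kappa, delta, chi)

theta = 0.5 * np.angle(-gp / gm)             # rejected quadrature: exp(2i theta) = -g_+/g_-
reflected = gm * np.exp(1j * theta) + gp * np.exp(-1j * theta)
```

At dynamic resonance both coefficients have magnitude $|\Delta|/\kappa$. With the example choice $\Delta = \kappa/2$, the orthogonal quadrature is therefore reflected with gain $|g_-| + |g_+| = 1$, the unit-gain/zero-gain behaviour used for the quadrature filter.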


A.2.2 Two mode Kerr-nonlinear resonator 


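We label the mode amplitudes as $\alpha_1$ and $\alpha_2$. In this case the nonlinearity includes a cross-mode induced detuning

$$a_{\rm NL}^{\rm Kerr2}(\alpha) = \begin{pmatrix} -i\chi_a|\alpha_1|^2\alpha_1 - i\chi_{ab}|\alpha_2|^2\alpha_1 \\ -i\chi_{ab}|\alpha_1|^2\alpha_2 - i\chi_b|\alpha_2|^2\alpha_2 \end{pmatrix}. \tag{21}$$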
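The model matrices are

$$A = \begin{pmatrix} -\frac{\kappa_a}{2} - i\Delta_a & 0 \\ 0 & -\frac{\kappa_b}{2} - i\Delta_b \end{pmatrix}, \tag{22}$$

$$B = -\begin{pmatrix} \sqrt{\kappa_{a,1}} & \cdots & \sqrt{\kappa_{a,n_a}} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & \sqrt{\kappa_{b,1}} & \cdots & \sqrt{\kappa_{b,n_b}} \end{pmatrix}, \tag{23}$$

$$C = -B^T, \tag{24}$$

where $\kappa_{a/b} = \sum_j \kappa_{a/b,j}$, and the corresponding SLH model is

$$\left(1_{n_a+n_b},\; \begin{pmatrix} \sqrt{\kappa_{a,1}}\,a \\ \vdots \\ \sqrt{\kappa_{a,n_a}}\,a \\ \sqrt{\kappa_{b,1}}\,b \\ \vdots \\ \sqrt{\kappa_{b,n_b}}\,b \end{pmatrix},\; \tilde\Delta_a a^\dagger a + \tilde\Delta_b b^\dagger b + \frac{\chi_a}{2} a^{\dagger 2} a^2 + \frac{\chi_b}{2} b^{\dagger 2} b^2 + \chi_{ab}\, a^\dagger a\, b^\dagger b\right), \tag{25}$$

with $\tilde\Delta_{a/b} = \Delta_{a/b} + \chi_{a/b} + \frac{\chi_{ab}}{2}$ and where the Wigner correspondence³ is $\langle\alpha_1\rangle_W = \langle a\rangle$, $\langle\alpha_2\rangle_W = \langle b\rangle$.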

We briefly summarize how to construct a controlled phase shifter using an ideal two-mode Kerr cavity with a single input coupling to each mode and negligible additional internal losses. We exploit that in this case the reflected steady-state signal amplitude $\xi'$ is identical to the input amplitude $\xi$ up to a power-dependent phase shift,

$$\xi' = -\frac{\kappa_a/2 - i(\Delta_a + \chi_a|\alpha_0|^2 + \chi_{ab}|\beta_0|^2)}{\kappa_a/2 + i(\Delta_a + \chi_a|\alpha_0|^2 + \chi_{ab}|\beta_0|^2)}\,\xi, \tag{26}$$

where $\alpha_0$ and $\beta_0$ denote the steady-state amplitudes of the signal and control modes, respectively.

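We assume that the control input amplitude takes on two discrete values, $\zeta = 0$ or $\zeta = \zeta_0$, and that variations of the signal input amplitude are small, $|\xi| \approx |\xi_0|$. In this case a good choice of detunings and coupling rates is given by

$$\Delta_a = -\frac{\kappa_a}{2} - \frac{2\chi_a|\xi_0|^2}{\kappa_a}, \tag{27}$$

$$\Delta_b = -\frac{\kappa_a\chi_b}{\chi_{ab}} - \frac{2\chi_{ab}|\xi_0|^2}{\kappa_a}, \tag{28}$$

$$|\zeta_0| = \frac{\sqrt{\kappa_a\kappa_b}}{2\sqrt{|\chi_{ab}|}}, \tag{29}$$

in addition to two inequality constraints

$$|\Delta_a| < \frac{\sqrt{3}}{2}\kappa_a, \tag{30}$$

$$|\Delta_b| < \frac{\sqrt{3}}{2}\kappa_b, \tag{31}$$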
(31) 


that ensure that the system is stable. This construction ensures that $\left.\xi'\right|_{\zeta=\zeta_0} / \left.\xi'\right|_{\zeta=0} = -1$, and in fact it can easily be generalized to the more realistic case of non-negligible internal losses.
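The π phase flip can be checked by substituting the design values back into the steady-state equations. The following Python sketch is a consistency check under stated assumptions, not a dynamical simulation: all Kerr coefficients are taken positive, the example numbers ($\kappa_a = \kappa_b = 1$, $\chi_a = \chi_{ab} = 0.01$, $\chi_b = 0.001$, $\xi_0 = 1$) are arbitrary, and the intra-cavity photon numbers are assumed to be those implied by the design ($|\alpha_0|^2 = 2|\xi_0|^2/\kappa_a$, with $|\beta_0|^2 = 0$ or $4|\zeta_0|^2/\kappa_b$). The helper names are hypothetical.

```python
import numpy as np

def design(kappa_a, kappa_b, chi_a, chi_b, chi_ab, xi0):
    """Detunings and control amplitude from Eqs. (27)-(29)."""
    delta_a = -kappa_a / 2 - 2 * chi_a * abs(xi0)**2 / kappa_a
    delta_b = -kappa_a * chi_b / chi_ab - 2 * chi_ab * abs(xi0)**2 / kappa_a
    zeta0 = np.sqrt(kappa_a * kappa_b) / (2 * np.sqrt(abs(chi_ab)))
    return delta_a, delta_b, zeta0

def residuals(n_a, n_b, xi, zeta, p):
    """Photon-number form of the two-mode steady-state equations (zero at a fixpoint)."""
    kappa_a, kappa_b, chi_a, chi_b, chi_ab, delta_a, delta_b = p
    theta_a = delta_a + chi_a * n_a + chi_ab * n_b      # effective detuning of the signal mode
    theta_b = delta_b + chi_b * n_b + chi_ab * n_a      # effective detuning of the control mode
    r_a = n_a * (kappa_a**2 / 4 + theta_a**2) - kappa_a * abs(xi)**2
    r_b = n_b * (kappa_b**2 / 4 + theta_b**2) - kappa_b * abs(zeta)**2
    return r_a, r_b, theta_a

def reflection(theta, kappa_a):
    """Signal reflection factor xi'/xi from Eq. (26)."""
    return -(kappa_a / 2 - 1j * theta) / (kappa_a / 2 + 1j * theta)

kappa_a, kappa_b, chi_a, chi_b, chi_ab, xi0 = 1.0, 1.0, 0.01, 0.001, 0.01, 1.0
delta_a, delta_b, zeta0 = design(kappa_a, kappa_b, chi_a, chi_b, chi_ab, xi0)
p = (kappa_a, kappa_b, chi_a, chi_b, chi_ab, delta_a, delta_b)
assert abs(delta_a) < np.sqrt(3) / 2 * kappa_a and abs(delta_b) < np.sqrt(3) / 2 * kappa_b

n_a = 2 * abs(xi0)**2 / kappa_a
off = residuals(n_a, 0.0, xi0, 0.0, p)                       # control off
on = residuals(n_a, 4 * zeta0**2 / kappa_b, xi0, zeta0, p)   # control on
ratio = reflection(on[2], kappa_a) / reflection(off[2], kappa_a)
```

The effective signal detuning switches from $-\kappa_a/2$ to $+\kappa_a/2$, which produces the ratio of $-1$. The sketch asserts inequalities (30) and (31) first: if either is violated the cavity may become bistable, and the assumed photon numbers need not describe the unique steady state.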

Finally, note that the inequality constraints imply that the lower bounds for the input couplings scale as $\kappa^{\min} \propto |\zeta_0|$, which is important for our power analysis in Section 3.1. This, in turn, implies that $\xi_0 \propto |\zeta_0|$, which is a fairly intuitive result.

3 In this appendix we denote expectations with respect to the Wigner function as $\langle\cdot\rangle_W$ and quantum mechanical expectations as $\langle\cdot\rangle$.
The controlled phase shifter can now be included in one arm of a Mach-Zehnder interferometer to create a Fredkin gate (cf. Section 2.4).

To realize a thresholder, the control mode input is prepended with a two-port Kerr cavity whose parameters are chosen such that it becomes dynamically resonant, with maximal differential transmission gain, close to the point where its output gives the correct high control input $\zeta_0$.


Overall, we remark that even when we account for the prepended cavity, the relationship $c \propto |\zeta_0|$ still holds, where $c$ is the input to the thresholder. To see how the total decay rate $\kappa_{\rm thresh}$ of the thresholding cavity scales, consider first that to get maximum differential gain or contrast, we ought to pick a detuning right at or below the Kerr stability threshold, $\Delta \approx \Delta_{\rm th} = \sqrt{3}\,\kappa_{\rm thresh}/2$.


We choose the maximum input amplitude such that it approximately achieves dynamic resonance within the prepended thresholding cavity. This occurs when $\Delta = -\chi|\alpha_0|^2$ (cf. Appendix A.2.1) and at an input amplitude of $c \propto \sqrt{|\Delta/\chi|\,\kappa_{\rm thresh}} \propto \kappa_{\rm thresh}$.


A.2.3 NOPO model 


The NOPO model consists of three modes: the signal and idler modes $\alpha_s$, $\alpha_i$ and the pump mode $\alpha_p$. We assume a triply resonant model⁴ and that $\omega_s + \omega_i = \omega_p$, allowing for resonant conversion of pump photons into pairs of signal and idler photons and vice versa. The nonlinearity is given by


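$$a_{\rm NL}^{\rm NOPO}(\alpha) = \begin{pmatrix} \chi\,\alpha_i^*\alpha_p \\ \chi\,\alpha_s^*\alpha_p \\ -\chi\,\alpha_s\alpha_i \end{pmatrix} \tag{32}$$

and the model matrices are

$$A = -\frac{1}{2}\begin{pmatrix} \kappa & 0 & 0 \\ 0 & \kappa & 0 \\ 0 & 0 & \kappa_p \end{pmatrix}, \qquad B = -\begin{pmatrix} \sqrt{\kappa} & 0 & 0 \\ 0 & \sqrt{\kappa} & 0 \\ 0 & 0 & \sqrt{\kappa_p} \end{pmatrix}, \tag{33}$$

$$C = -B^T. \tag{34}$$

Here, the SLH model is given by

$$\left(1_3,\; \begin{pmatrix} \sqrt{\kappa}\,a \\ \sqrt{\kappa}\,b \\ \sqrt{\kappa_p}\,c \end{pmatrix},\; i\chi\left(a^\dagger b^\dagger c - a\,b\,c^\dagger\right)\right), \tag{35}$$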


where now $a$, $b$ and $c$ denote the mode operators corresponding to $\alpha_s$, $\alpha_i$ and $\alpha_p$.

A steady-state analysis of the system driven only by a pump input amplitude $\epsilon$ reveals that below a critical threshold

$$|\epsilon| < \epsilon_{\rm th} = \frac{\kappa\sqrt{\kappa_p}}{4\chi}$$

the system has a unique fixpoint with $\alpha_s = \alpha_i = 0$ and $\alpha_p = -\frac{2\epsilon}{\sqrt{\kappa_p}}$. Above threshold, $|\epsilon| > \epsilon_{\rm th}$, the intra-cavity pump amplitude stays constant at the threshold value $\alpha_p = -\frac{2\epsilon_{\rm th}}{\sqrt{\kappa_p}}\frac{\epsilon}{|\epsilon|}$, and the signal and idler modes obtain non-zero magnitude

$$|\alpha_s| = |\alpha_i| = \sqrt{\frac{\sqrt{\kappa_p}}{\chi}\left(|\epsilon| - \epsilon_{\rm th}\right)}. \tag{36}$$

4 It is possible to drop this resonance assumption for the pump.

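As an interesting consequence of the model's symmetry, there exists not a single above-threshold state but a whole manifold of fixpoints parametrized by a correlated signal and idler phase

$$\alpha_s = \sqrt{\frac{\sqrt{\kappa_p}}{\chi}\left(|\epsilon| - \epsilon_{\rm th}\right)}\; e^{i\phi}, \tag{37}$$

$$\alpha_i = \sqrt{\frac{\sqrt{\kappa_p}}{\chi}\left(|\epsilon| - \epsilon_{\rm th}\right)}\; e^{i(\phi_0 - \phi)}, \tag{38}$$

where the common phase $\phi_0$ is fixed by the pump input phase via

$$\alpha_s\alpha_i = -\frac{\sqrt{\kappa_p}}{\chi}\,\frac{\epsilon}{|\epsilon|}\left(|\epsilon| - \epsilon_{\rm th}\right). \tag{39}$$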

In particular, for $\epsilon < 0$ we have $\alpha_i = \alpha_s^*$. Above threshold, the system will rapidly converge to a fixpoint of well-defined phase $\phi$. Without quantum shot noise $\phi$ would remain constant. With noise, however, the system can freely diffuse along the manifold. When the pump bias input is sufficiently large compared to threshold, and consequently there are many signal and idler photons present in the cavity at any given time ($|\alpha_s|^2 \gg 1$), one can analyze the dynamics along the manifold and of small orthogonal deviations from the manifold. In the symmetric case considered here, where signal and idler have equal decay rates, the differential phase degree of freedom $\phi$ (the phase parametrizing the fixpoint manifold of Eqs. (37) and (38)) decouples from all other variables and approximately obeys the SDE

$$d\phi = \sqrt{D_\phi}\, dW_t, \qquad dW_t^2 = dt, \tag{40}$$

$$D_\phi = \frac{\kappa}{8|\alpha_s|^2} = \frac{\kappa^2}{32\,\epsilon_{\rm th}\left(|\epsilon| - \epsilon_{\rm th}\right)}. \tag{41}$$

It is relatively straightforward to generalize these results to a less symmetric model with different signal and idler couplings and even non-zero detunings, but for a given nonlinearity the model considered here provides the smallest phase diffusion and thus the best analog memory. For a very thorough analysis of this model we refer to [21].
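The fixpoint and diffusion formulas can be cross-checked numerically. The Python sketch below assumes zero detunings, real positive $\chi$, the drift implied by Eqs. (32) to (34) with only the pump driven, and example values $\kappa = 1$, $\kappa_p = 10$, $\chi = 0.01$, $\epsilon = -2\epsilon_{\rm th}$; the function names and the phase tolerance are hypothetical. It verifies that Eqs. (36) to (38) make the drift vanish for an arbitrary manifold phase. It then converts the diffusion constant of Eq. (41) into a rough memory time, i.e., the time after which the rms phase drift $\sqrt{D_\phi t}$ reaches a chosen tolerance.

```python
import numpy as np

def nopo_fixpoint(eps, kappa, kappa_p, chi, phi):
    """Steady state on the above-threshold manifold, Eqs. (36)-(39)."""
    eps_th = kappa * np.sqrt(kappa_p) / (4 * chi)
    if abs(eps) <= eps_th:                                   # below threshold: pump only
        return 0.0, 0.0, -2 * eps / np.sqrt(kappa_p), eps_th
    r = np.sqrt(np.sqrt(kappa_p) / chi * (abs(eps) - eps_th))
    phi0 = np.angle(-eps)                                    # common phase set by the pump
    alpha_s = r * np.exp(1j * phi)
    alpha_i = r * np.exp(1j * (phi0 - phi))
    alpha_p = -2 * eps_th / np.sqrt(kappa_p) * eps / abs(eps)
    return alpha_s, alpha_i, alpha_p, eps_th

def nopo_drift(alpha_s, alpha_i, alpha_p, eps, kappa, kappa_p, chi):
    """Deterministic drift A alpha + a_NL(alpha) + B beta_in with beta_in = (0, 0, eps)."""
    d_s = -kappa / 2 * alpha_s + chi * np.conj(alpha_i) * alpha_p
    d_i = -kappa / 2 * alpha_i + chi * np.conj(alpha_s) * alpha_p
    d_p = -kappa_p / 2 * alpha_p - chi * alpha_s * alpha_i - np.sqrt(kappa_p) * eps
    return np.array([d_s, d_i, d_p])

kappa, kappa_p, chi = 1.0, 10.0, 0.01
eps_th = kappa * np.sqrt(kappa_p) / (4 * chi)
eps = -2 * eps_th
alpha_s, alpha_i, alpha_p, _ = nopo_fixpoint(eps, kappa, kappa_p, chi, phi=0.7)
drift = nopo_drift(alpha_s, alpha_i, alpha_p, eps, kappa, kappa_p, chi)

d_phi = kappa / (8 * abs(alpha_s)**2)                        # Eq. (41), first form
d_phi_alt = kappa**2 / (32 * eps_th * (abs(eps) - eps_th))   # Eq. (41), second form
tolerance = 0.1                                              # rad, example rms drift budget
memory_time = tolerance**2 / d_phi                           # in units of 1/kappa
```

With these numbers $D_\phi = 5\times 10^{-6}\,\kappa$, so the stored phase stays within 0.1 rad rms for roughly $2000/\kappa$. This illustrates why the NOPO line width sets the slowest time scale of the circuit.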


A.3 Composite component models 

Given the scope of this article, we refrain from including the full netlists for the composite component models here and instead publish them online at [37]. We remark that composing a photonic circuit from the nonlinear photonic models described above is often complicated by the fact that the steady-state input-output relationships are hard or even impossible to invert analytically. A systematic approach to optimizing component parameters would be highly desirable.